Project Closing Data Visualization

Team Algoritma

29 June 2020

What have we learned?

Programming For Data Science

Analyze. Share. Reproduce. with R Markdown

Your data tells a story. Tell it with R Markdown. Turn your analyses into high quality documents, reports, presentations and dashboards.

R Markdown supports dozens of static and dynamic output formats including HTML, PDF, MS Word, Beamer, HTML5 slides, Tufte-style handouts, books, dashboards, shiny applications, scientific articles, websites, and more.


Practical statistics

  • Descriptive Statistics
  • Inferential statistics

Data Visualization in R

  • The Goal of Visualization

Exploratory

to uncover a relationship in the data
to analyze data

Explanatory

to communicate a relationship in the data
to present data

  • Data with the same summary statistics can look vastly different.
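The point above can be checked with a small sketch. It uses Anscombe's quartet, which ships with base R as `anscombe`; this dataset is an assumption chosen for illustration and is not necessarily the one shown in the workshop. The helper names (`set_ids`, `summary_stats`) are hypothetical.

```r
# `anscombe` is a built-in data frame with four x/y pairs: x1..x4 and y1..y4.
set_ids <- 1:4

# Compute the same summary statistics for each of the four pairs.
summary_stats <- data.frame(
  set    = set_ids,
  mean_x = sapply(set_ids, function(i) mean(anscombe[[paste0("x", i)]])),
  mean_y = sapply(set_ids, function(i) mean(anscombe[[paste0("y", i)]])),
  cor_xy = sapply(set_ids, function(i) {
    cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
  })
)
print(round(summary_stats, 2))

# Plot the four pairs side by side: the numbers agree, the shapes do not.
old_par <- par(mfrow = c(2, 2))
for (i in set_ids) {
  plot(
    anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]],
    xlab = paste0("x", i), ylab = paste0("y", i),
    main = paste("Set", i)
  )
}
par(old_par)
```

The table of means and correlations is nearly identical across the four sets, which is why plotting the data before trusting a summary is worthwhile.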
YouTube Trending US, 2018

trending_date  channel_title          category_id       views    likes
2017-11-14     CaseyNeistat           People and Blogs  748374   57527
2017-11-14     LastWeekTonight        Entertainment     2418783  97185
2017-11-14     Rudy Mancuso           Comedy            3191434  146033
2017-11-14     Good Mythical Morning  Entertainment     343168   10172
2017-11-14     nigahiga               Entertainment     2095731  132235

Whenever we visualize, we are encoding data using visual cues, or mapping data onto variation in size, shape or color, and so on:
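As a minimal sketch of that encoding idea, the five YouTube rows shown in this deck can be mapped onto position, color, and size with ggplot2. The `like_rate` column and the choice of which variable goes to which visual cue are assumptions made for illustration, not the plot from the workshop.

```r
library(ggplot2)

# The five rows from the YouTube Trending sample shown in this deck.
trending <- data.frame(
  channel_title = c("CaseyNeistat", "LastWeekTonight", "Rudy Mancuso",
                    "Good Mythical Morning", "nigahiga"),
  category_id   = c("People and Blogs", "Entertainment", "Comedy",
                    "Entertainment", "Entertainment"),
  views         = c(748374, 2418783, 3191434, 343168, 2095731),
  likes         = c(57527, 97185, 146033, 10172, 132235)
)

# Hypothetical derived variable: likes per view.
trending$like_rate <- trending$likes / trending$views

# Each aes() argument maps one data column onto one visual cue:
# position (x, y), color (category), and size (like rate).
p <- ggplot(trending, aes(x = views, y = likes,
                          color = category_id, size = like_rate)) +
  geom_point() +
  labs(title = "YouTube Trending US (sample rows)",
       x = "Views", y = "Likes",
       color = "Category", size = "Likes per view")

print(p)
```

Swapping a mapping, for example `shape = category_id` instead of `color`, changes the visual cue without touching the data.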



Interactive Plotting

  • Build web app dashboard

Motivational: Why Learn R?

Open Source

Part of the reason for its active and rapidly growing community is the open-source nature of R. Users can contribute packages, many of which implement advanced statistical tools that are not found in commercial, proprietary statistical software.

R does not involve lots of pointing and clicking, and that’s a good thing

The learning curve might be steeper than with other software, but with R, the results of your analysis do not rely on remembering a succession of pointing and clicking, but instead on a series of written commands, and that’s a good thing!

R works on data of all shapes and sizes

The skills you learn with R scale easily with the size of your dataset. Whether your dataset has hundreds or millions of rows, it won’t make much difference to you. R is designed for data analysis. It comes with special data structures and data types that make handling missing data and statistical factors convenient. R can connect to spreadsheets, databases, and many other data formats, on your computer or on the web.
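A short base R sketch of the missing-data and factor handling mentioned above. The values come from the YouTube sample in this deck, except that one `likes` value is replaced by `NA` purely for illustration; the variable names are hypothetical.

```r
# Likes from the sample rows, with the third value set to NA for illustration.
likes <- c(57527, 97185, NA, 10172, 132235)

# By default a missing value propagates through a calculation.
mean_with_na <- mean(likes)

# na.rm = TRUE drops missing values before computing.
mean_without_na <- mean(likes, na.rm = TRUE)

# is.na() flags missing entries, so they are easy to count.
n_missing <- sum(is.na(likes))

# Factors store categorical data together with its set of levels.
category <- factor(c("People and Blogs", "Entertainment", "Comedy",
                     "Entertainment", "Entertainment"))
category_counts <- table(category)

print(mean_with_na)
print(mean_without_na)
print(n_missing)
print(levels(category))
print(category_counts)
```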

Used by the Biggest Software Companies in the World

The BBC data team developed an R package and an R cookbook to make creating publication-ready graphics in its house style with R’s ggplot2 library more reproducible, and to make it easier for people new to R to create graphics.

Google uses R to calculate ROI on advertising campaigns and to estimate causal effects (for example, the impact of an app feature on app downloads, or the number of additional sales from an AdWords campaign). It has even released its own R packages so that other R users can run similar analyses with the same tools (see CausalImpact). Data science employees at Google participate in user groups to discuss how R is used at Google. The company has also published its own R client for the Google Prediction API and Google’s R style guide, and its developers have released a number of R packages over the years.

Ready for Big Data

The Spark interface to R (sparklyr), Microsoft R Open, several parallelization extensions, and a handful of other toolkits add powerful big data support, allowing data engineers to write custom parallel and distributed (map-reduce style) algorithms in R. This makes R a popular choice for big data analysis and for high-performance, enterprise-level analytics platforms.

How to learn more after the workshop?

Use the built-in RStudio help interface to search for more information on R functions.
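The same help system is reachable from the console. The sketch below lists the usual interactive commands as comments, since they open a help page, and then runs two lookups that also work in a script. The search pattern and the `readers` name are illustrative assumptions.

```r
# In an interactive session (RStudio shows the result in the Help pane):
#   ?mean                     # help page for a function
#   help("mean")              # same as ?mean
#   ??"standard deviation"    # search across installed help pages
#   example(mean)             # run the examples from the help page

# These lookups also work non-interactively.

# Show the arguments a function accepts.
print(args(mean))

# Find objects whose name matches a pattern, e.g. functions starting with "read.".
readers <- apropos("^read\\.")
print(readers)
```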

I’m stuck, I get an error message that I don’t understand 😢

Start by googling the error message. However, this doesn’t always work very well because often, package developers rely on the error catching provided by R. You end up with general error messages that might not be very helpful to diagnose a problem (e.g. “subscript out of bounds”).

Next, check Stack Overflow. Search using the [r] tag. Most questions have already been answered, but the challenge is to use the right words in the search to find the answers.
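Before searching, it often helps to reproduce the error in a few lines. A minimal sketch, assuming a two-element list as the hypothetical culprit: it triggers the “subscript out of bounds” message mentioned above and captures it with `tryCatch()`.

```r
# A list with two elements; asking for a third triggers the error.
x <- list(10, 20)

# tryCatch() intercepts the error so we can inspect its message
# instead of stopping the script.
msg <- tryCatch(
  x[[3]],
  error = function(e) conditionMessage(e)
)
print(msg)

# A quick check that usually explains this error: compare the index
# you asked for with the actual length of the object.
print(length(x))
```

A short reproduction like this also makes a good starting point for a Stack Overflow question if the search turns up nothing.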

Hone your data wrangling and visualization skills by participating in #TidyTuesday on Twitter.

Visit the GitHub repository.
Explore the various kinds of interactive table and plotting packages in R.

Explore shiny and its variants.

Happy Learning and Coding! 😁